Krauss Matching Wildcards Algorithm
   HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
, the Krauss wildcard-matching algorithm is a
pattern matching In computer science, pattern matching is the act of checking a given sequence of tokens for the presence of the constituents of some pattern. In contrast to pattern recognition, the match usually has to be exact: "either it will or will not be ...
algorithm. Based on the wildcard syntax in common use, e.g. in the
Microsoft Windows Windows is a group of several proprietary graphical operating system families developed and marketed by Microsoft. Each family caters to a certain sector of the computing industry. For example, Windows NT for consumers, Windows Server for serv ...
command-line interface A command-line interpreter or command-line processor uses a command-line interface (CLI) to receive commands from a user in the form of lines of text. This provides a means of setting parameters for the environment, invoking executables and pro ...
, the algorithm provides a non-
recursive Recursion (adjective: ''recursive'') occurs when a thing is defined in terms of itself or of its type. Recursion is used in a variety of disciplines ranging from linguistics to logic. The most common application of recursion is in mathematics ...
mechanism for matching patterns in software applications, based on syntax simpler than that typically offered by
regular expression A regular expression (shortened as regex or regexp; sometimes referred to as rational expression) is a sequence of characters that specifies a search pattern in text. Usually such patterns are used by string-searching algorithms for "find" or ...
s.


History

The algorithm is based on a history of development, correctness and performance testing, and programmer feedback that began with an unsuccessful search for a reliable non-recursive algorithm for matching wildcards. An initial algorithm, implemented in a single while loop, quickly prompted comments from software developers, leading to improvements. Ongoing comments and suggestions culminated in a revised algorithm still implemented in a single while loop but refined based on a collection of
test case In software engineering, a test case is a specification of the inputs, execution conditions, testing procedure, and expected results that define a single test to be executed to achieve a particular software testing objective, such as to exercise ...
s and a performance profiler. The experience tuning the single while loop using the profiler prompted development of a two-loop strategy that achieved further performance gains, particularly in situations involving empty input strings or input containing no wildcard characters. The two-loop algorithm is available for use by the
open-source software Open-source software (OSS) is computer software that is released under a license in which the copyright holder grants users the rights to use, study, change, and distribute the software and its source code to anyone and for any purpose. Op ...
development community, under the terms of the Apache License v. 2.0, and is accompanied by test case code.


Usage

The algorithm made available under the Apache license is implemented in both pointer-based
C++ C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significan ...
and portable C++ (implemented without pointers). The test case code, also available under the Apache license, can be applied to any algorithm that provides the pattern matching operations below. The implementation as coded is unable to handle multibyte character sets and poses problems when the text being searched may contain multiple incompatible character sets.


Pattern matching operations

The algorithm supports three pattern matching operations: * A one-to-one match is performed between the pattern and the source to be checked for a match, with the exception of asterisk ( *) or question mark ( ?) characters in the pattern. * An asterisk ( *) character matches any sequence of zero or more characters. * A question mark ( ?) character matches any single character.


Examples

* ''*foo*'' matches any string containing "foo". * ''mini*'' matches any string that begins with "mini" (including the string "mini" itself). * ''???*'' matches any string of three or more letters.


Applications

The original algorithm has been ported to the
DataFlex DataFlex is an object-oriented high-level programming language and a fourth generation visual tool 4GL for developing Windows, web and mobile software applications on one framework-based platform. It was introduced and developed by ''Data Access C ...
programming language by Larry Heiges for use with Data Access Worldwide code library. It has been posted on GitHub in modified form as part of a log file reader. The 2014 algorithm is part of the Unreal Model Viewer built into the Epic Games
Unreal Engine Unreal Engine (UE) is a 3D computer graphics game engine developed by Epic Games, first showcased in the 1998 first-person shooter game ''Unreal''. Initially developed for PC first-person shooters, it has since been used in a variety of genres ...
game engine A game engine is a software framework primarily designed for the development of video games and generally includes relevant libraries and support programs. The "engine" terminology is similar to the term "software engine" used in the software i ...
.


See also

*
pattern matching In computer science, pattern matching is the act of checking a given sequence of tokens for the presence of the constituents of some pattern. In contrast to pattern recognition, the match usually has to be exact: "either it will or will not be ...
*
glob (programming) In computer programming, glob () patterns specify sets of filenames with wildcard characters. For example, the Unix Bash shell command mv *.txt textfiles/ moves (mv) all files with names ending in .txt from the current directory to the directory ...
*
wildmat wildmat is a pattern matching library developed by Rich Salz. Based on the wildcard syntax already used in the Bourne shell, wildmat provides a uniform mechanism for matching patterns across applications with simpler syntax than that typically ...


References

{{reflist Algorithms